User:DarTar/Hypothes.is Reputation Workshop

This page in a nutshell: A report from the Hypothes.is Reputation Workshop which I attended on February 22-24, 2012

Hypothes.is is a non-profit organization whose aim is to implement an open source peer-review system allowing Web users to:

annotate a broad range of online resources (documents, images, movies, datasets, media in general)
filter and rank annotations to help other users assess the credibility of these resources

They successfully raised initial funding via a Kickstarter project and received additional financial support from the Sloan Foundation. The Electronic Frontier Foundation is one of their endorsers, with John Perry Barlow as one of their board members.

Hypothes.is founder Dan Whaley invited us to participate in the workshop to contribute to the design of this service with input from Wikimedia and to help them make design decisions for a suitable reputation model to be used by the service. Prominent Wikipedia researchers (GroupLens founder John Riedl, and WikiTrust creator Luca de Alfaro) were among the participants.

About Hypothes.is

From the hypothes.is website:

Hypothes.is will be a distributed, open-source platform for the collaborative evaluation of information. It will enable sentence-level critique of written words combined with a sophisticated yet easy-to-use model of community peer-review. It will work as an overlay on top of any stable content, including news, blogs, scientific articles, books, terms of service, ballot initiatives, legislation and regulations, software code and more, without requiring participation of the underlying site. It is based on a new draft standard for annotating digital documents currently being developed by the Open Annotation Collaboration, a consortium that includes the Internet Archive, NISO (National Information Standards Organization), O'Reilly Books, Amazon, Barnes and Noble, and a number of academic institutions.

Purpose of the workshop

The workshop brought together academic and industry experts in online reputation systems (from organizations/companies such as eBay, Reddit, Slashdot, StackOverflow) as well as organizations developing technology and standards for collaborative annotation (e.g. Mendeley, AnnotateIt, Open Annotation Collaboration), large-scale content providers (Wikimedia, the Public Library of Science) and data preservation organizations (Internet Archive). The purpose of the workshop was to:

identify use cases and an ideal domain for the initial launch of the service
discuss pain points in the design of an open annotation system
make design decisions about the reputation and user identity model to be used by the service
gather requirements from content providers that may support Hypothes.is

The agenda of the workshop, including links to the etherpads used in the different sessions, is listed under References below.

Wikimedia and Hypothes.is

WMF has a number of strategic reasons to participate in this initiative and help shape the design of its technology from the onset:

Some of the Foundation's in-house projects (such as Article Feedback) aim to let editors and readers attach granular annotations to Wikipedia articles, functionality that is very close to the core of Hypothes.is.
Many quality-related inline templates designed by the community (such as CN, CN-Span or POV-statement) provide a rudimentary annotation system similar to the one Hypothes.is aims to design.
Our ImageAnnotator gadget provides very basic support for image annotation, which is one of the use cases Hypothes.is will be built for. Support for annotations of other media types in MediaWiki is very sketchy or nonexistent.
Collaborative credibility and quality assessment mechanisms are at the core of the functioning of our community and projects, but they have no support in the product.
Open annotation as a potential on-ramp for new contributors is a notion we're actively exploring with Article Feedback.
The vision behind Hypothes.is and its commitment to open source technology and openly licensed content are aligned with Wikimedia's mission to support open knowledge.

Highlights from the workshop

The workshop was an exceptional opportunity to inform the design of Hypothes.is with use cases relevant to Wikimedia and to engage in conversations about Wikipedia with experts in online reputation, online identity, moderation and meta-moderation. Below is a summary of the main outcomes that (more or less directly) involve Wikimedia projects.

Wikipedia as a proof-of-concept use case

The organizers put a lot of effort into identifying a set of use cases to allow a soft launch of Hypothes.is. The domain for a soft launch should be sufficiently constrained to produce a compelling proof-of-concept. Other conditions that were discussed for an ideal soft launch use case include:

the persistence of the resources to be annotated (or in the case of versioned resources, the ability to uniquely identify resource revisions in a persistent way)
the possibility to use a consistent set of fragments as anchors for user annotations
the public accessibility of the resources (which excluded cases like paywalled articles from online newspapers or from the scholarly literature)
the existence of a large population of Web users already visiting the same resources
the existence of natural incentives for users to start annotating and assessing these resources (to mitigate the cold-start problem)
some expectation that Hypothes.is, when applied to these resources, may help improve their quality or their public understanding

Two use cases stood out as meeting the above requirements:

The collaborative annotation of Wikipedia articles
The collaborative annotation of bills (e.g. SOPA, PIPA, ACTA, RWA)

Supporting the editorial work of Wikipedia contributors by allowing a large population of readers to annotate/flag articles for specific issues emerged as a good example for a potential soft launch. Hypothes.is would be able to implement such an annotation system on top of Wikipedia without any formal support or commitment of resources from WMF but would love to have input from the community and the Foundation on how to make this tool useful for our contributors.

Annotation and versioning

The use cases mentioned above helped us focus on two distinct functions that Hypothes.is could support:

Surface issues and themes from the collaborative annotation of a static resource (e.g. help citizens identify the sections of a bill that were pushed by a powerful lobby)
Use annotations to suggest possible revisions to the author(s) of an original resource

The second use case was inspired by Wikipedia, by the notion of open peer review in scholarly communication or "citizen journalism" and by the model of "pull requests" in distributed revision control systems. In the case of Wikipedia, an example workflow for the resolution and integration of an annotation into a document could be the following:

User A annotates a fragment (word, sentence, or section) of a Wikipedia article with an annotation (e.g. Here is a reference in support of this unsourced statement)
Editor B finds the annotation useful and integrates the annotation into the text of the article by committing a revision
The suggestion is marked as resolved and linked to the revision of the article
User A is notified of the successful integration of her suggestion
User A's reputation score is increased as a result of the successfully converted annotation

The idea of building the concept of a "resolved issue" into Hypothes.is, tying it to a particular version of the master document and giving credit to the author(s) of the annotation, emerged as a potential requirement to support the above use case. The model is similar to the one the Foundation is planning to experiment with in the context of author notifications for featured/promoted Article Feedback posts.
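The workflow above can be sketched in a few lines. This is only an illustration of the "resolved issue" idea, not a design that was agreed at the workshop: the `Annotation` and `Tracker` names, the notification format and the one-point reward are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Annotation:
    author: str
    article: str
    fragment: str                       # the annotated word, sentence, or section
    body: str                           # the suggestion itself
    resolved_in: Optional[int] = None   # id of the revision that integrated it


@dataclass
class Tracker:
    """Hypothetical bookkeeping for reputation scores and notifications."""
    reputation: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)

    def resolve(self, annotation: Annotation, revision_id: int,
                editor: str, reward: int = 1) -> None:
        """Link an annotation to the revision that integrated it and credit its author."""
        if annotation.resolved_in is not None:
            raise ValueError("annotation is already resolved")
        annotation.resolved_in = revision_id
        message = f"{editor} integrated your suggestion in revision {revision_id}"
        self.notifications.append((annotation.author, message))
        self.reputation[annotation.author] = (
            self.reputation.get(annotation.author, 0) + reward
        )


tracker = Tracker()
note = Annotation(
    author="UserA",
    article="Example article",
    fragment="an unsourced statement",
    body="Here is a reference in support of this unsourced statement",
)
tracker.resolve(note, revision_id=1001, editor="EditorB")
```

Tying the annotation to a revision id rather than to the article title is what keeps the credit meaningful after the article changes again.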

Moderation strategies

Several presentations and discussions tackled the problem of how to use reputation for the purpose of moderation (assessing the value of user contributions) and meta-moderation (assessing the fairness and accuracy of moderation acts). These problems are similar to those that we're facing in the context of Article Feedback's Feedback Page, MoodBar, and New Page Triage.

Erik Martin (Reddit's general manager) and Jeff Atwood (StackOverflow co-founder) gave an overview of how they use reputation or karma to unlock specific privileges or to make the achievement of these privileges conditional on performing a number of desirable actions.

The Slashdot approach to moderation was extensively discussed as it combines user privileges (only some users can moderate) with the notion that moderation (and meta-moderation) should be blind and random in order to be effective (only once in a while users get "moderation credits" that they can spend). Paul Resnick showed that a basic principle to build robust (meta-)moderation systems is to make them blind (the moderator doesn't know the identity of the contributor, but only judges the value of the contribution) and random (the moderator doesn't get a chance to select which items to evaluate). The solution implemented by websites such as KittenWar or AllOurIdeas (using random blind pairwise comparison) was discussed as an excellent implementation of these principles to make the (meta-)moderation process fair, game-resistant, and scalable.
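A minimal sketch of blind, random pairwise comparison follows. It is not the implementation used by Slashdot, KittenWar or AllOurIdeas; the function names, the item structure and the simple win-rate score are assumptions chosen to show the two principles: the system picks the pair (random) and the moderator never sees the author (blind).

```python
import random
from collections import defaultdict


def draw_pair(items, rng):
    """Pick two distinct items at random; the moderator does not choose them."""
    return rng.sample(items, 2)


def blind_view(item):
    """Expose only the contribution text, never the author."""
    return item["text"]


def record_vote(scores, winner_id, loser_id):
    """Count one comparison: both items were shown, one of them won."""
    scores[winner_id]["wins"] += 1
    scores[winner_id]["shown"] += 1
    scores[loser_id]["shown"] += 1


def win_rate(scores, item_id):
    """Fraction of comparisons an item has won so far."""
    s = scores[item_id]
    return s["wins"] / s["shown"] if s["shown"] else 0.0


items = [
    {"id": 1, "author": "alice", "text": "Adds a missing citation"},
    {"id": 2, "author": "bob", "text": "First!"},
    {"id": 3, "author": "carol", "text": "Points out an outdated figure"},
]
scores = defaultdict(lambda: {"wins": 0, "shown": 0})
rng = random.Random(0)

left, right = draw_pair(items, rng)
# A real moderator would be shown blind_view(left) and blind_view(right);
# here the longer text stands in for the moderator's judgement.
if len(blind_view(left)) >= len(blind_view(right)):
    winner, loser = left, right
else:
    winner, loser = right, left
record_vote(scores, winner["id"], loser["id"])
```

Because each judgement is a single cheap comparison on a pair the moderator did not pick, a user who wants to promote or bury a specific contribution has little control over when it appears.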

Identity management

Two sessions tackled the problem of what user identity model should be used by Hypothes.is. Paul Resnick, Drummond Reed and Kaliya Hamlin presented solutions to the problem of controlling sybil attacks or identity usurpation without compromising on the notion of pseudonymity. Kaliya gave an overview of the spectrum of possible identity models (from full anonymity to pseudonymity to individually asserted identity, to socially validated identity to verified identity) and discussed the pros and cons of each solution. A particularly hot topic was the notion of "cheap pseudonyms" and the fact that lowering the costs of the creation of new personas can easily threaten the function of reputation systems. Paul Resnick summarized these principles as follows:

there should be incentives for newcomers to obtain a positive reputation
allocating an initial trust by default to every new user is an easy opportunity for abuse
new users have to "pay their dues" in order to prove their value to the community
there must be mechanisms in place to make it unattractive for a user to start over with a new identity

Paul Resnick proposed the idea of "expensive pseudonyms" via an identity-issuing authority that could release "once-in-a-lifetime" pseudonyms and protect the real identity of individuals while making it costly for them to forge multiple identities. Kaliya Hamlin, on the other hand, stressed the importance of "Limited Liability Personas" to allow users to keep multiple personas (and their attached reputation values) in different contexts, and allow them to dump a single persona if for any reason it got screwed up (without affecting the reputation of other personas). Jeff Atwood had some compelling stories on the mistake they made at StackOverflow when they allowed users to rename their personas.

Reputation and participation

Jeff Atwood explained how the main design principle behind the StackOverflow reputation system was to sustain early participation and allow newcomers to earn credibility and progressively become part of the core community. The fact that StackOverflow reputation turned out to be portable (e.g. if I am an expert on StackOverflow I can now link to my StackOverflow reputation in my CV and capitalize on it to apply for a new job) was a happy but unintended consequence of this design. The use of badges was also critical for two purposes:

get the community to spontaneously discover new functionality without needing to read a boring FAQ (interesting!)
incentivize a sufficient number of users to take up specific governance roles

Luca de Alfaro gave an overview of his work on WikiTrust and noted that, at the time it was first proposed, its reception by the Wikimedia community was very cold. We discussed the problem of measuring editor reputation based on edit survival, which is only useful in areas of Wikipedia in which edits can actually be modified or reverted by other users (e.g. in the Article namespace, but less so in other namespaces) and doesn't adequately capture other valuable roles for the community that cannot be measured by this metric.

Content-driven vs pluggable reputation

An interesting dichotomy emerged between two opposite approaches to reputation that Hypothes.is could build on:

Make reputation entirely content-driven (i.e. allow users to gain reputation by measuring how the user community as a whole values their contributions) and build robust moderation/meta-moderation mechanisms to control for gaming
Make Hypothes.is reputation agnostic by allowing external reputation systems to be plugged into it and letting individual users decide what "reputation goggles" to use in order to filter annotations

The two approaches were exemplified with compelling use cases that could successfully implement their respective reputation model.
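The difference between the two approaches can be sketched as follows. Both are modeled here as interchangeable scoring functions ("goggles") that a reader picks when filtering annotations. All names, scores and thresholds are made up for the example, and the external reputation source is purely hypothetical.

```python
from typing import Callable, Dict, List

# A "goggle" maps an author name to a score; higher means more trusted.
Goggle = Callable[[str], float]


def content_driven(votes: Dict[str, int]) -> Goggle:
    """Reputation derived from how the community rated an author's past annotations."""
    return lambda author: float(votes.get(author, 0))


def external(scores: Dict[str, float]) -> Goggle:
    """Reputation imported from a hypothetical external system."""
    return lambda author: scores.get(author, 0.0)


def filter_annotations(annotations: List[dict], goggle: Goggle,
                       threshold: float) -> List[dict]:
    """Keep annotations whose author meets the threshold, most trusted first."""
    kept = [a for a in annotations if goggle(a["author"]) >= threshold]
    return sorted(kept, key=lambda a: goggle(a["author"]), reverse=True)


annotations = [
    {"author": "alice", "text": "Source needed here"},
    {"author": "bob", "text": "This figure is outdated"},
]
community = content_driven({"alice": 12, "bob": 3})
imported = external({"bob": 0.9})

by_community = filter_annotations(annotations, community, threshold=5)
by_import = filter_annotations(annotations, imported, threshold=0.5)
```

The same two annotations yield different views depending on the goggles: the community-driven filter keeps only alice's note, the imported one keeps only bob's.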

Secondary outcomes

The following were some of the unintended outcomes of my participation in the workshop, among many other valuable discussions I had with the participants:

A work session with John Riedl on how to better integrate GroupLens research and Wikimedia's product development research.
A discussion with Jeff Atwood to distill lessons learned in the context of StackOverflow (in the area of reputation, user engagement and moderation) that Wikimedia may benefit from.
Several discussions with people from Mendeley, the Public Library of Science, and the altmetrics.org movement on the role that Wikipedia could play in making the scholarly communication system more open and what technology may support this effort.
A meeting with Rob Sanderson, from the Open Annotation Collaboration project, to discuss Memento and its possible role in Wikipedia. Memento is a project allowing clients to negotiate resources in the state they were in at a particular point in time via a dedicated Accept-Datetime HTTP header. They developed a MediaWiki extension that can honor time-based HTTP requests and serve the closest matching revision of an article. There is now an open request in Bugzilla to review and deploy this extension in production.
A discussion with ReadWriteWeb writer Jon Mitchell about the potential of "remote calls-to-action" or the idea of embedding invitations to contribute to Wikipedia into those sources where we are more likely to find experts and potential new contributors on specific technical topics.
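For readers unfamiliar with Memento, the datetime negotiation mentioned in the list above amounts to adding one request header carrying an HTTP-formatted date. The sketch below only builds such a request with the Python standard library and does not send it; the URL is a placeholder, and which servers honor the header depends on their Memento support.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_request(url: str, when: datetime) -> Request:
    """Build (but do not send) a request for the state of `url` at time `when`."""
    # Memento expects an HTTP date in GMT, e.g. "Wed, 22 Feb 2012 12:00:00 GMT".
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(url, headers={"Accept-Datetime": http_date})


# Placeholder URL, not a real Memento endpoint.
req = memento_request(
    "https://example.org/wiki/Some_article",
    datetime(2012, 2, 22, 12, 0, tzinfo=timezone.utc),
)
```

A Memento-aware server would answer such a request with, or redirect to, the revision closest to the requested time.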

References

The Workshop

Complete list/bios of workshop attendees
The official reading list of the workshop
The agenda of the workshop

Other reports

Jon Mitchell, How We're Going to Fix Online Identity and Reputation, ReadWriteWeb
Heather Ford, Online reputation: it’s contextual, Ethnography Matters

Orgs/projects

AnnotateIt - an open source project of the Open Knowledge Foundation, which forms the main codebase for Hypothes.is
PieTrust - a pluggable online reputation system
Open Annotation Collaboration - Web annotation meets Linked Data
Connect.me - a reputation platform whose users endorse/vouch for each other
MetaCurrency - a p2p project to support emergent currency systems
Internet Identity Workshop - workgroup creating tools and standards for user-centered identity management